Abstract: A generalized-statistics variational principle for 
source separation is formulated by recourse to Tsallis' entropy 
subjected to the additive duality and employing constraints 
described by normal averages. The variational principle is 
amalgamated with Hopfield-like learning rules, resulting in an 
unsupervised learning model. The update rules are formulated 
with the aid of q-deformed calculus. Numerical examples 
demonstrate the efficacy of this model. 

I. Introduction 

Recent studies have suggested that minimization of the 
Helmholtz free energy in statistical physics [1] plays a central 
role in understanding action, perception, and learning (see [2] 
and the references therein). In fact, it has been suggested 
that the principle of free energy minimization is even more 
fundamental than the redundancy reduction principle (also 
known as the principle of efficient coding) articulated by 
Barlow [3] and later formalized by Linsker as the Infomax 
principle [4]. Specifically, the principle of efficient coding 
states that the brain should optimize the mutual information 
between its sensory signals and some parsimonious neuronal 
representations. This is identical to optimizing the parameters 
of a generative model to maximize the accuracy of predictions, 
under complexity constraints. Both are mandated by the free- 
energy principle, which can be regarded as a probabilistic 
generalization of the Infomax principle. 

The Infomax principle has been central to the develop- 
ment of independent component analysis (ICA) and the allied 
problem of blind source separation (BSS) [5]. Within the 
ICA/BSS context, very few models based on minimization of 
the free energy exist, the most prominent of them originated 
by Szu and co-workers (e.g., see Refs. [6,7]) to achieve source 
separation in remote sensing (i.e. hyperspectral imaging (HSI)) 
using the maximum entropy principle. The ICA/BSS problem 
may be summarized in terms of the relation 



$$x = A s, \tag{1}$$



where s is the unknown source vector to be extracted, A is the 
unknown mixing matrix (also known as reflectance matrix or 
material abundance matrix in HSI), and x is the known vector 
of observed data. The Helmholtz free energy is described 
within the framework of Boltzmann-Gibbs-Shannon (B-G-S) 
statistics as 

$$F(T) = U - k_B T S, \tag{2}$$

where T is the thermodynamic temperature (or homeostatic 
temperature in the parlance of cybernetics), $k_B$ the Boltzmann 
constant, U the internal energy, and S Shannon's entropy. 
A more principled and systematic way to study 
free energy minimization within the context of the maximum 
entropy principle (MaxEnt) is to replace the minimization 
of the Helmholtz free energy with the maximization 
of the Massieu potential [8] 

$$\Phi(\beta) = S - \beta U, \tag{3}$$

where $\beta = 1/(k_B T)$ is the inverse thermodynamic temperature. 
The Massieu potential is the Legendre transform of the 
Helmholtz free energy, i.e.: $\Phi = -\beta F$. 
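
The relation between (2) and (3) can be checked numerically for an ordinary Boltzmann distribution. The sketch below is illustrative only: the energy levels and the value of beta are hypothetical, and $k_B = 1$ is assumed. It verifies that $S - \beta U$ coincides with both $\ln Z$ and $-\beta F$.

```python
import numpy as np

def massieu_check(energies, beta):
    """Return (S - beta*U, ln Z, -beta*F) for a Boltzmann distribution, k_B = 1."""
    energies = np.asarray(energies, dtype=float)
    weights = np.exp(-beta * energies)
    Z = weights.sum()                 # canonical partition function
    p = weights / Z                   # Boltzmann probabilities
    S = -np.sum(p * np.log(p))        # Shannon entropy
    U = np.sum(p * energies)          # internal energy
    T = 1.0 / beta                    # temperature (k_B = 1 assumed)
    F = U - T * S                     # Helmholtz free energy, Eq. (2)
    return S - beta * U, np.log(Z), -beta * F

# Hypothetical energy levels and inverse temperature.
phi, log_z, minus_beta_f = massieu_check([0.0, 0.5, 1.3, 2.0], beta=1.7)
```

Maximizing $\Phi$ at fixed $\beta$ is therefore the same as minimizing $F$, which is why the Massieu potential can replace the free energy in the MaxEnt setting.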

The generalized (also, interchangeably, nonadditive, de- 
formed, or nonextensive) statistics of Tsallis has recently been 
the focus of much attention in statistical physics, complex sys- 
tems, and allied disciplines [9]. Nonadditive statistics suitably 
generalizes the extensive, orthodox B-G-S one. The scope of 
Tsallis statistics has lately been extended to studies in lossy 
data compression in communication theory [10] and machine 
learning [11,12]. 

It is important to note that power law distributions like the q- 
Gaussian distribution cannot be accurately modeled within the 
B-G-S framework [9]. One of the most commonly encountered 
sources of q-Gaussian distributions is the 
normalization of measurement data using Studentization 
techniques [13]. q-Gaussian behavior is also exhibited by 
elliptically invariant data, which generalize spherically symmetric 
distributions. q-Gaussians are also an excellent approximation 
to correlated Gaussian data, and to other important and 
fundamental physical and biological processes (for example, see 
[14] and the references therein). 
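
The link between Studentization and q-Gaussians can be made concrete with a small numerical check. The sketch below relies on the standard identification of a Student-t density with $\nu$ degrees of freedom as a q-Gaussian with $q = (\nu+3)/(\nu+1)$ and width parameter $\beta = (\nu+1)/(2\nu)$. This identification is an assumption brought in for illustration and is not taken from this paper; the function names are likewise hypothetical.

```python
import numpy as np

def exp_q(x, q):
    """q-deformed exponential; reduces to exp(x) when q = 1."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.exp(x)
    base = 1.0 + (1.0 - q) * x
    safe = np.where(base > 0.0, base, 1.0)   # dummy value where the cut-off applies
    return np.where(base > 0.0, safe ** (1.0 / (1.0 - q)), 0.0)

def student_t_shape(x, nu):
    """Student-t density divided by its value at x = 0."""
    x = np.asarray(x, dtype=float)
    return (1.0 + x ** 2 / nu) ** (-(nu + 1.0) / 2.0)

nu = 5.0                              # hypothetical degrees of freedom
q = (nu + 3.0) / (nu + 1.0)           # assumed q for the equivalent q-Gaussian
beta = (nu + 1.0) / (2.0 * nu)        # assumed width parameter
x = np.linspace(-6.0, 6.0, 241)
max_gap = np.max(np.abs(student_t_shape(x, nu) - exp_q(-beta * x ** 2, q)))
```

Under these assumptions the two curves agree to rounding error, which illustrates why Studentized data fall outside the Gaussian (B-G-S) family.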

This paper intends to accomplish the following objectives: 

(i) to formulate and solve a variational principle for 
source separation using the maximum dual Tsallis entropy 
with constraints defined by normal-average expectations, 

(ii) to amalgamate the variational principle with 
Hopfield-like learning rules [15] to acquire information 
regarding unknown parameters via an unsupervised learning 
paradigm, 

(iii) to formulate a numerical framework for the generalized 
statistics unsupervised learning model and demonstrate, 
with the aid of numerical examples for separation 
of independent sources (endmembers), the superiority of 
the generalized statistics source separation model vis-à-vis 
an equivalent B-G-S model for a single pixel. 

It is important to note that by amalgamating the information- 
theoretic model with the Hopfield model, [A] acquires the role 
of the Associative Memory (AM) matrix. Further, employing 
a Hopfield-like learning rule renders the model presented in 
this paper readily amenable to hardware implementation using 
Field Programmable Gate Arrays (FPGAs). 

The additive duality is a fundamental property in gener- 
alized statistics [9]. One implication of the additive duality 
is that it permits a deformed logarithm defined by a given 
nonadditivity parameter (say, q) to be inferred from its dual 
deformed logarithm parameterized by q* = 2 - q. This paper 
derives a variational principle for source separation using the 
dual Tsallis entropy with normal averages constraints. This 
approach has been utilized previously (e.g., Ref. [16]), and 
possesses the property of seamlessly yielding a q*-deformed 
exponential form on variational extremization. 

An important issue to address concerns the manner in which 
expectation values are computed. Of the various forms in 
which expectations may be defined in nonextensive statistics, 
only the linear constraints originally employed by 
Tsallis [9] (also known as normal averages), of the form 
$\langle A \rangle = \sum_i p_i A_i$, have been found to be physically satisfactory 
and consistent with both the generalized H-theorem and the 
generalized Stosszahlansatz (molecular chaos hypothesis) [17, 
18]. A re-formulation of the variational perturbation 
approximations in nonextensive statistical physics followed [18], via 
an application of q-deformed calculus [19]. Results from the 
study in Ref. [19] have been successfully utilized in Section 
IV of this paper. 

This introductory Section is concluded by briefly describing 
the suitability of employing a generalized statistics model to 
study the source separation problem. First, in the case of 
remote sensing applications, and even more so in the case of 
HSI, the observed data are highly correlated, even in the case 
of a single pixel. Next, the observed data are required to be 
normalized (scaled). The Studentization process is one of the 
most prominent methods utilized to normalize the observed 
data [20,21]. Both these features lead to an excursion from 
the Gaussian framework (B-G-S statistics) and result in 
q-Gaussian pdfs characterized by the q-deformed exponential 
$\exp_q(-x) = [1 - (1-q)x]^{\frac{1}{1-q}}$, which maximizes the Tsallis 
entropy. 



II. Theoretical preliminaries 

This Section introduces the essential concepts around which 
this communication revolves. The Tsallis entropy is defined as 
[9] 

$$S_q(X) = -\sum_x p(x)^q \ln_q p(x). \tag{4}$$

The q-deformed logarithm and the q-deformed exponential are 
defined as [9, 19] 

$$\ln_q(x) = \frac{x^{1-q} - 1}{1-q},$$

and, 

$$\exp_q(x) = \begin{cases} [1 + (1-q)x]^{\frac{1}{1-q}}; & 1 + (1-q)x > 0, \\ 0; & \text{otherwise}. \end{cases} \tag{5}$$

Note that as $q \to 1$, (4) acquires the form of the equivalent 
B-G-S entropies. Likewise in (5), $\ln_q(x) \to \ln(x)$ and 
$\exp_q(x) \to \exp(x)$. The operations of q-deformed relations 
are governed by q-deformed algebra and q-deformed calculus 
[19]. Apart from providing an analogy to equivalent expressions 
derived from B-G-S statistics, q-deformed algebra and 
q-deformed calculus endow generalized statistics with a unique 
information geometric structure. The q-deformed addition $\oplus_q$ 
and the q-deformed subtraction $\ominus_q$ are defined as [19] 

$$x \oplus_q y = x + y + (1-q)xy, \qquad \ominus_q y = \frac{-y}{1 + (1-q)y}, \qquad x \ominus_q y = \frac{x - y}{1 + (1-q)y}. \tag{6}$$

The q-deformed derivative is defined as [19] 

$$D_q^x F(x) = \lim_{y \to x} \frac{F(x) - F(y)}{x \ominus_q y} = [1 + (1-q)x]\frac{dF(x)}{dx}. \tag{7}$$

As $q \to 1$, $D_q^x F(x) \to dF(x)/dx$, the Newtonian derivative. 
The Leibnitz rule for deformed derivatives [19] is 

$$D_q^x[A(x)B(x)] = B(x)\,D_q^x A(x) + A(x)\,D_q^x B(x). \tag{8}$$
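
The deformed operations above are easy to experiment with numerically. The following is a minimal sketch, with hypothetical function names and test values, of the q-deformed logarithm, exponential, sum, difference, and a finite-difference version of the q-deformed derivative. It checks three standard consequences of the definitions: $\exp_q$ inverts $\ln_q$, $\exp_q(x \oplus_q y) = \exp_q(x)\exp_q(y)$, and $\exp_q$ reproduces itself under the q-deformed derivative.

```python
import numpy as np

def ln_q(x, q):
    """q-deformed logarithm; ordinary log when q = 1."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """q-deformed exponential with the cut-off to zero."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.exp(x)
    base = 1.0 + (1.0 - q) * x
    safe = np.where(base > 0.0, base, 1.0)   # dummy value where the cut-off applies
    return np.where(base > 0.0, safe ** (1.0 / (1.0 - q)), 0.0)

def q_add(x, y, q):
    """q-deformed addition."""
    return x + y + (1.0 - q) * x * y

def q_sub(x, y, q):
    """q-deformed subtraction."""
    return (x - y) / (1.0 + (1.0 - q) * y)

def q_derivative(F, x, q, h=1e-6):
    """Finite-difference estimate of the limit defining the q-deformed derivative."""
    y = x - h
    return (F(x) - F(y)) / q_sub(x, y, q)

q, x, y = 0.75, 0.4, 0.9   # hypothetical test values
inverse_gap = float(exp_q(ln_q(x, q), q) - x)
product_gap = float(exp_q(q_add(x, y, q), q) - exp_q(x, q) * exp_q(y, q))
eigen_gap = q_derivative(lambda t: float(exp_q(t, q)), x, q) - float(exp_q(x, q))
```

The last property, $D_q^x \exp_q(x) = \exp_q(x)$, is what makes the deformed derivative the natural tool for perturbing distributions built from q-deformed exponentials.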

Re-parameterizing (5) via the additive duality [10], q* = 
2 - q, yields the dual deformed logarithm and exponential 

$$\ln_{q^*}(x) = -\ln_q\left(\frac{1}{x}\right), \quad \text{and,} \quad \exp_{q^*}(x) = \frac{1}{\exp_q(-x)}. \tag{9}$$

The dual Tsallis entropy is defined by [10, 16] 

$$S_{q^*}(X) = -\sum_x p(x) \ln_{q^*} p(x). \tag{10}$$

Here, $\ln_{q^*}(x) = \frac{x^{1-q^*} - 1}{1-q^*}$. The dual Tsallis entropies 
acquire a form similar to the B-G-S entropies, with $\ln_{q^*}(\cdot)$ 
replacing $\ln(\cdot)$. 
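
The additive duality can be verified directly. The sketch below uses hypothetical names and an arbitrary test distribution. It checks the two relations in (9) and confirms that the dual entropy (10) evaluated at q* = 2 - q returns the same number as the Tsallis entropy (4) evaluated at q.

```python
import numpy as np

def ln_q(x, q):
    """q-deformed logarithm; ordinary log when q = 1."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """q-deformed exponential with the cut-off to zero."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.exp(x)
    base = 1.0 + (1.0 - q) * x
    safe = np.where(base > 0.0, base, 1.0)   # dummy value where the cut-off applies
    return np.where(base > 0.0, safe ** (1.0 / (1.0 - q)), 0.0)

def tsallis_entropy(p, q):
    """Eq. (4): S_q = -sum_x p^q ln_q p."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p ** q * ln_q(p, q)))

def dual_tsallis_entropy(p, q_star):
    """Eq. (10): S_{q*} = -sum_x p ln_{q*} p."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * ln_q(p, q_star)))

q = 1.25
q_star = 2.0 - q                          # additive duality
p = np.array([0.5, 0.3, 0.15, 0.05])      # hypothetical distribution
x = 0.8                                   # hypothetical test point

log_gap = float(ln_q(x, q_star) + ln_q(1.0 / x, q))
exp_gap = float(exp_q(x, q_star) - 1.0 / exp_q(-x, q))
entropy_gap = tsallis_entropy(p, q) - dual_tsallis_entropy(p, q_star)
```

The practical benefit of the dual form is visible in `dual_tsallis_entropy`: the probabilities enter linearly, exactly as in the B-G-S entropy, which is what allows normal-average constraints to be used.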

III. Variational principle 
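Consider the Lagrangian 

$$\Phi^*_{q^*}[s_j] = -\sum_{j=1}^N s_j \ln_{q^*} s_j - \sum_{i=1}^N \lambda_i \left( \sum_{j=1}^N A_{ij} s_j - x_i \right) + \lambda_0 \left( \sum_{j=1}^N s_j - 1 \right), \tag{11}$$

subject to the component-wise constraints 

$$\sum_{j=1}^N s_j = 1, \quad \text{and} \quad \sum_{j=1}^N A_{ij} s_j = x_i. \tag{12}$$

Clearly, the RHS of the Lagrangian (11) is the q*-deformed 
Massieu potential $\Phi_{q^*}[\lambda]$, subject to the normalization 
constraint on $s_j$. The variational extremization of (11), performed 
using the Ferri-Martinez-Plastino methodology [22], leads to 

$$\begin{aligned}
&-\frac{2-q^*}{1-q^*}\, s_j^{1-q^*} + \frac{1}{1-q^*} - \sum_{i=1}^N \lambda_i A_{ij} + \lambda_0 = 0, \\
&\frac{2-q^*}{1-q^*}\, s_j^{1-q^*} = \frac{1}{1-q^*} + \lambda_0 - \sum_{i=1}^N \lambda_i A_{ij}, \\
&s_j = \left[ \frac{1 - (1-q^*)\left( -\lambda_0 + \sum_{i=1}^N \lambda_i A_{ij} \right)}{2-q^*} \right]^{\frac{1}{1-q^*}}.
\end{aligned} \tag{13}$$

Multiplying the second relation in (13) by $s_j$ and summing 
over all j, yields after application of the normalization 
condition in (12) 

$$\frac{2-q^*}{1-q^*}\, \aleph_{q^*} = \frac{1}{1-q^*} + \lambda_0 - \sum_{j=1}^N \sum_{i=1}^N \lambda_i A_{ij} s_j, \tag{14}$$

where $\aleph_{q^*} = \sum_{j=1}^N s_j^{2-q^*}$. Substituting (14) into the third 
relation in (13) yields 

$$s_j = \left[ \aleph_{q^*} + (1-q^*) \sum_{j=1}^N \sum_{i=1}^N \tilde\lambda_i A_{ij} s_j - (1-q^*) \sum_{i=1}^N \tilde\lambda_i A_{ij} \right]^{\frac{1}{1-q^*}}, \tag{15}$$

with $\tilde\lambda_i = \lambda_i/(2-q^*)$. Eq. (15) yields after some algebra 

$$s_j = \left[ \aleph_{q^*} + (1-q^*) \sum_{j=1}^N \sum_{i=1}^N \tilde\lambda_i A_{ij} s_j \right]^{\frac{1}{1-q^*}} \left[ 1 - (1-q^*) \sum_{i=1}^N \lambda^*_i A_{ij} \right]^{\frac{1}{1-q^*}}, \tag{16}$$

where 

$$\lambda^*_i = \frac{\tilde\lambda_i}{\aleph_{q^*} + (1-q^*) \sum_{j=1}^N \sum_{i=1}^N \tilde\lambda_i A_{ij} s_j},$$

so that 

$$s_j = \frac{\exp_{q^*}\left( -\sum_{i=1}^N \lambda^*_i A_{ij} \right)}{Z_{q^*}}, \quad \text{and,} \quad \left( \aleph_{q^*} + (1-q^*) \sum_{j=1}^N \sum_{i=1}^N \tilde\lambda_i A_{ij} s_j \right)^{\frac{1}{q^*-1}} = Z_{q^*}. \tag{17}$$

Here $Z_{q^*}$ is the canonical partition function, where 
$Z_{q^*} = \sum_{j=1}^N \exp_{q^*}\left( -\sum_{i=1}^N \lambda^*_i A_{ij} \right)$. The dual Tsallis entropy takes the 
form 

$$S_{q^*}[s] = \frac{\aleph_{q^*} - 1}{q^*-1}; \quad \sum_{j=1}^N s_j = 1 \;\Rightarrow\; \aleph_{q^*} = 1 + (q^*-1)\, S_{q^*}[s]. \tag{18}$$

Substituting now (18) into the expression for $Z_{q^*}$ in (17) 
results in 

$$-\ln_{q^*}\left( \frac{1}{Z_{q^*}} \right) = S_{q^*}[s] - \sum_{j=1}^N \sum_{i=1}^N \tilde\lambda_i A_{ij} s_j = \Phi_{q^*}[\tilde\lambda]. \tag{19}$$

Clearly, $\Phi_{q^*}$ in (19) is a q*-deformed Massieu potential. 
By substituting (18) into (14) we arrive at 

$$\bar\lambda_0 = -\tilde\lambda_0 + \frac{1}{2-q^*} = S_{q^*}[s] - \sum_{j=1}^N \sum_{i=1}^N \tilde\lambda_i A_{ij} s_j, \tag{20}$$

with $\tilde\lambda_0 = \lambda_0/(2-q^*)$. Again, $\bar\lambda_0$ in (20) is a q*-deformed 
Massieu potential: $\Phi_{q^*}[\tilde\lambda]$. We wish to relate $\bar\lambda_0$ and $Z_{q^*}$. 
To this end, comparison of (19) and (20) yields 

$$\bar\lambda_0 = -\ln_{q^*}\left( \frac{1}{Z_{q^*}} \right) \;\Rightarrow\; Z_{q^*} = \left[ 1 - (1-q^*)\bar\lambda_0 \right]^{\frac{1}{q^*-1}}, \tag{21}$$

so that, by substituting (18) into (15) and then invoking (20) 
we get 

$$s_j = \left[ 1 - (1-q^*)\bar\lambda_0 - (1-q^*) \sum_{i=1}^N \tilde\lambda_i A_{ij} \right]^{\frac{1}{1-q^*}}. \tag{22}$$

Here, (22) is re-defined with the aid of (20) as 

$$s_j = \left[ 1 - (1-q^*)\bar\lambda_0 \right]^{\frac{1}{1-q^*}} \left[ 1 - (1-q^*) \sum_{i=1}^N \lambda^*_i A_{ij} \right]^{\frac{1}{1-q^*}}, \qquad Z_{q^*} = \sum_{j=1}^N \left[ 1 - (1-q^*) \sum_{i=1}^N \lambda^*_i A_{ij} \right]^{\frac{1}{1-q^*}}, \tag{23}$$

where $\lambda^*_i = \frac{\tilde\lambda_i}{1 - (1-q^*)\bar\lambda_0}$. 
With the aid of (21), (22) is re-cast in the form 

$$s_j = \frac{\exp_{q^*}\left( -\sum_{i=1}^N \lambda^*_i A_{ij} \right)}{\left[ 1 - (1-q^*)\bar\lambda_0 \right]^{\frac{1}{q^*-1}}} = \frac{\exp_{q^*}\left( -\sum_{i=1}^N \lambda^*_i A_{ij} \right)}{Z_{q^*}}, \tag{24}$$

where $\tilde\lambda_i = \frac{\lambda_i}{2-q^*}$, $\tilde\lambda_0 = \frac{\lambda_0}{2-q^*}$, and $\lambda^*_i = \frac{\tilde\lambda_i}{1 - (1-q^*)\bar\lambda_0}$. 
Finally, invoking the normalization of $s_j$, (24) yields 

$$\sum_{j=1}^N \exp_{q^*}\left( -\sum_{i=1}^N \lambda^*_i A_{ij} \right) = \left[ 1 - (1-q^*)\bar\lambda_0 \right]^{\frac{1}{q^*-1}} = Z_{q^*}. \tag{25}$$

Note the self-referential nature of (23) in the sense that $\lambda^*_i$ 
(defined in (23)) is a function of $\bar\lambda_0$ (defined in (20)). The Lagrange 
multiplier $\lambda^*_i$ is henceforth defined in this paper as the dual 
normalized Lagrange force multiplier. 

IV. Unsupervised Learning rules 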



The process of unsupervised learning is amalgamated with 
the above information theoretic structure via a Hopfield-like 
learning rule to update the AM matrix [A] in the case of a 
perturbation $\Delta x_j$ of the observed data 



dt 



(l-g*)Ao 



ln„ 



l-(l-g*)Ao 

(l-9*)Ao 



Ao 



ln„ 



Ao 



where, 



Si = 



N 

(1 - g*) E A* A,,- 

i=l 

■ + (1 - 9*) E KAij 

i=l 



At: 



(2-9*)Ao' 



(26) 

which is obtained from the first relation in (13) and (24). 
Gradient ascent along with (24) gives rise to the second learning 
rule 



dxj _ d^l* [sj] 
dt dAii 



where, $*, [sj 



At: 



(27) 



(l-«*)Ao' 



In (26) and (27), $\Phi^*_{q^*}[s_j]$ is the LHS of the Lagrangian (11). 

Now, a critical update rule is that for the change in the 
dual normalized Lagrange force multipliers $\lambda^*_k$ resulting 
from a perturbation $\Delta x_j$ in the observed data. Usually (as 
stated within the context of the B-G-S framework), such an 
update would entail a Taylor expansion yielding, up to the 
first order, $\Delta x_j = \sum_{k=1}^N \frac{\partial x_j}{\partial \lambda_k} \Delta\lambda_k$. Such an analysis is valid 
only for distributions characterized by the regular exponential 
$\exp(-x)$. For probability distributions characterized by 
q-deformed exponentials, i.e., the ones we face here, such a 
perturbation treatment would lead to unphysical results [18]. 

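Thus, following the prescription given in Ref. [18], for 
a function $F(\tau)$ the chain rule yields 
$\frac{dF(\tau)}{d\lambda} = \frac{dF(\tau)}{d\tau}\frac{d\tau}{d\lambda}$. 
Thus, replacing the Newtonian derivative $\frac{dF(\tau)}{d\tau}$ 
by the q*-deformed one defined by (7) (see Ref. [19]), 
$D^\tau_{q^*} F(\tau) = [1 + (1-q^*)\tau]\frac{dF(\tau)}{d\tau}$, 
and defining $D^\tau_{q^*} F(\tau)\frac{d\tau}{d\lambda} = \delta_{q^*,\tau} F(\tau)$ 
as well, facilitates the desired transformation 
$\frac{dF(\tau)}{d\lambda} \to \delta_{q^*,\tau} F(\tau)$. Consequently, the update 
rule for $\lambda^*$ is re-formulated via q-deformed calculus in the 
fashion 

$$\Delta x_j = \sum_{k=1}^N \left[ \delta_{q^*,\tau}\, x_j \right] \Delta\lambda^*_k = \sum_{k=1}^N \left[ \sum_{i=1}^N D^\tau_{q^*}\left( A_{ji} s_i \right) \frac{\partial \tau}{\partial \lambda^*_k} \right] \Delta\lambda^*_k. \tag{28}$$

Additionally, setting $-\sum_k A_{ik}\lambda^*_k = \tau$ in (23) leads to 

$$s_i = \frac{[1 + (1-q^*)\tau]^{\frac{1}{1-q^*}}}{Z_{q^*}}. \tag{29}$$

Employing at this stage the Leibnitz rule for q*-deformed 
derivatives (and replacing q by q* in (8)), the term within 
square parenthesis on the RHS in (28) yields 

$$\sum_{i=1}^N D^\tau_{q^*}\left( A_{ji} s_i \right) = \sum_{i=1}^N D^\tau_{q^*}\left( \frac{A_{ji}}{Z_{q^*}} [1 + (1-q^*)\tau]^{\frac{1}{1-q^*}} \right), \tag{30}$$

a relation that, after expansion and use of 
$D^\tau_{q^*}[1 + (1-q^*)\tau]^{\frac{1}{1-q^*}} = [1 + (1-q^*)\tau]^{\frac{1}{1-q^*}}$, turns into 

$$\begin{aligned}
\sum_{i=1}^N D^\tau_{q^*}\left( A_{ji} s_i \right) \frac{\partial \tau}{\partial \lambda^*_k}
&= \sum_{i=1}^N \left\{ \frac{A_{ji}}{Z_{q^*}}\, D^\tau_{q^*} [1 + (1-q^*)\tau]^{\frac{1}{1-q^*}} + A_{ji} [1 + (1-q^*)\tau]^{\frac{1}{1-q^*}}\, D^\tau_{q^*} \frac{1}{Z_{q^*}} \right\} \frac{\partial \tau}{\partial \lambda^*_k} \\
&= \sum_{i=1}^N \left\{ -\frac{1}{Z_{q^*}} [1 + (1-q^*)\tau]^{\frac{1}{1-q^*}} A_{ji} A_{ik} - A_{ji} [1 + (1-q^*)\tau]^{\frac{1}{1-q^*}} \frac{1}{Z_{q^*}^2}\, \delta_{q^*,\tau} Z_{q^*} \right\} \\
&= -\sum_{i=1}^N A_{ji} s_i A_{ik} + \sum_{i=1}^N A_{ji} s_i \sum_{l=1}^N s_l A_{lk} \\
&= -\sum_{i=1}^N A_{ji} s_i A_{ik} + x_j x_k.
\end{aligned} \tag{31}$$

Finally, the update rule for $\lambda^*_k$ with respect to $\Delta x_j$ adopts the 
appearance 

$$\Delta x_j = \sum_{k=1}^N \left( x_j x_k - \sum_{i=1}^N A_{ji} s_i A_{ik} \right) \Delta\lambda^*_k. \tag{32}$$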
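
The source estimate and the update matrix of (32) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the function names, the random mixing matrix, and the multipliers are hypothetical. The index placement follows the exponent $\sum_i \lambda^*_i A_{ij}$ of the q*-deformed exponential solution, so the second term is computed as $A\,\mathrm{diag}(s)\,A^T$; this coincides with the form printed in (32) whenever A is symmetric. For q* = 1 the matrix reduces to the ordinary first-order (Taylor) sensitivity, which is checked against finite differences.

```python
import numpy as np

def exp_q(x, q):
    """q-deformed exponential with the cut-off to zero."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.exp(x)
    base = 1.0 + (1.0 - q) * x
    safe = np.where(base > 0.0, base, 1.0)   # dummy value where the cut-off applies
    return np.where(base > 0.0, safe ** (1.0 / (1.0 - q)), 0.0)

def sources(A, lam_star, q_star):
    """s_j = exp_{q*}(-sum_i lam*_i A_ij) / Z_{q*}."""
    w = exp_q(-(lam_star @ A), q_star)
    return w / w.sum()

def sensitivity(A, s):
    """J_jk = x_j x_k - sum_i A_ji s_i A_ki, with x = A s (assumed index placement)."""
    x = A @ s
    return np.outer(x, x) - (A * s) @ A.T

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(4, 4))        # hypothetical mixing matrix
lam_star = rng.uniform(0.0, 0.3, size=4)      # hypothetical multipliers

s = sources(A, lam_star, q_star=0.75)         # deformed source estimate
J = sensitivity(A, s)                         # matrix multiplying the multiplier change

# q* = 1 limit: compare with a central finite-difference Jacobian of x(lam).
J1 = sensitivity(A, sources(A, lam_star, 1.0))
h = 1e-6
J_fd = np.column_stack([
    (A @ sources(A, lam_star + h * e, 1.0) - A @ sources(A, lam_star - h * e, 1.0)) / (2 * h)
    for e in np.eye(4)
])
```

Note that J is, up to sign, a covariance matrix of the columns of A under the weights $s_j$, so for a square A it is rank deficient; inverting the update rule in practice calls for a pseudo-inverse or a least-squares solve.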

V. Numerical computations 

The procedure for our double recursion problem is summarized 
in the pseudo-code below. 

Algorithm 1 Generalized Statistics Source Separation Model 

(1.) Input: (i) Observed data: x, (ii) Trial values of 
dual normalized Lagrange force multipliers: $\lambda^*$, (iii) Dual 
nonadditive parameter: q*. 
(2.) Initialization: 
Obtain $A_{ij}^{(0)}$ from: $A_{ij}^{(0)} = x_i\, \sigma_{q^*}(x_j)$ + 50 % random noise 
to break any rank-1 singularity. The q*-deformed sigmoid 
logistic function is: $\sigma_{q^*}(x_j) = \frac{1}{1 + \exp_{q^*}(-x_j)}$. 
(3.) First Recursion 
(i) Compute: $Z_{q^*}^{(0)}$ from (23), 
(ii) Compute: $\bar\lambda_0^{(0)}$ from (21), 
(iii) Compute: $s_j^{(0)}$, $\lambda_i^{(0)}$, and $\tilde\lambda_i^{(0)}$ from (23)/(24), 
(iv) Compute: $x_i^{(0)}$ from (5), thus: $\Delta x_i^{(0)} = x_i^{Known} - x_i^{(0)}$, 
(v) Compute $\Delta\lambda^{*(0)}$ by inverting (32), 
(vi) Compute next estimate: $\lambda^{*(1)} = \lambda^{*(0)} + \Delta\lambda^{*(0)}$. 
(4.) Second Recursion 
(vii) Compute improved estimate of $A_{ij}$ from (26) by setting 
$\Delta t = 1$ and solving (26) for the given $\Delta x_j$. 
(5.) Go to (3.) 

Following the procedure outlined in 
the above pseudo-code, values of $\lambda^*$ = 
[0.6228, 0.6337, 0.4577, 0.1095, 0.7252, 0.01752, 0.4128] and 
x = [0.5382, 0.1023, 0.6404, 0.4358, 0.0278, 0.2425, 0.3299] 
are provided. These values are the same as those in Ref. [7] 
and constitute experimentally obtained Landsat data for a 
single pixel. The difference between the generalized statistics 
model presented in this paper and the B-G-S model of [6,7] 
lies in the fact that the former has initial inputs of $\lambda^*$'s, 
whereas the latter merely has initial inputs of $\lambda$'s (a far 
simpler case). The self-referentiality in (23) mandates use of 
$\lambda^*$'s as the primary operational Lagrange multipliers. Note 
that the correlation coefficient of $x^{Known}$ is unity, a signature 
of highly correlated data. A value of q* = 0.75 is chosen. 
Figure 1 and Figure 2 depict, vs. the number of iterations, 
the source separation for the generalized statistics model and 
for the B-G-S model, respectively. Values of x are denoted by 
"o"'s. It is readily appreciated that the generalized statistics 
model exhibits a more pronounced source separation than the B-G-S 
model. Owing to the highly correlated nature of the observed 
data, such results are to be expected. 

VI. Summary and discussion 

A generalized statistics model for source separation that em- 
ploys an unsupervised learning paradigm has been presented in 
this communication. This model is shown to exhibit superior 
separation performance as compared to an equivalent model 
derived within the B-G-S framework. Our encouraging results 
should inspire future studies on the implications of first-order 
and second-order phase transitions of the Massieu potential. 
One would wish for a scheme enabling one 
to obtain self-consistent values of Lagrange multipliers based 
on the principle of phase transitions and symmetry breaking. 

Fig. 1. Source separation for generalized statistics model. [Plot title: "Generalized Statistics Model - Endmember Percentage"; horizontal axis: number of iterations. A second plot title, "Boltzmann-Gibbs-Shannon Model - Endmember Percentage", belongs to the companion B-G-S figure.] 